Genome-resolved metagenomics on anaerobic enrichments from deep subsurface groundwaters
Author
george.westmeijer@umu.se
Published
April 2, 2025
Read data and assign colors
The 16S ribosomal RNA data is from several sequencing projects. In chronological order, P15009 contains the groundwater samples from t0 (sampled Oct 2019), while P21015 contains the samples from t24 (sampled Feb 2020) from identical groundwaters. P21015 contains the sequencing libraries from the cultures as part of phase a (one meteoric groundwater), while P26201 contains the libraries from the cultures as part of phase b (three groundwaters as inoculum).
All 16S libraries were processed with the bioinformatic pipeline nf-core/ampliseq (v2.12.0), using the SBDI-GTDB (R09-RS220-1) as a reference.
This project includes 15 metagenomes from 15 unique cultures. The metagenomes contain 35 de-replicated metagenome-assembled genomes (MAG) that are affiliated with the Bacillota (7), Desulfobacterota (7), Pseudomonadota (6), Bacteroidota (4), Patescibacteria (3), Nanoarchaeota (2), Acidobacteriota (1), Campylobacterota (1), Chloroflexota (1), Iainarchaeota (1), Spirochaetota (1), and Verrucomicrobiota (1).
Figure 1: sampling-site
Expand code
coverm <-read_tsv("data/coverm.tsv.gz", col_types =cols(.default =col_character(), tpm =col_double()))emapper <-read_tsv("data/emapper.tsv.gz", show_col_types =FALSE)# Sample 35 (l7f) was removed from the binning process as this sample did not produce any binsbintax <-read_tsv("data/gtdbtk.summary.tsv", col_types =cols(.default =col_character())) quality <-read_tsv("data/quality_report.tsv", show_col_types = F) %>%rename_with(tolower) %>%rename(genome = name) %>%filter(genome %in% coverm$genome)
rare %>%inner_join(smd, by =c("site"="sample")) %>%# One sample has ~ 1 million reads filter(sample <150000) %>%ggplot(aes(sample, species, group = site, colour = origin)) +geom_line(linewidth =0.75) +scale_color_manual(values = cols.origin, name ="Groundwater", labels =c("Meteoric", "Marine", "Saline")) +labs(x ="No. of sequenced reads", y ="Rarefied No. of ASVs") +scale_x_continuous(labels =c("0", "50.000", "100.000", "150.000")) +theme_tidy() +theme(panel.border =element_rect(fill ="transparent", linewidth =0.75, colour ="#000000"), )
Figure 2: Rarefaction curves for the groundwater samples (t0) for each groundwater type (n = 6).
Expand code
ggsave(filename ="figures/rarecurve.pdf", width =100, height =80, units ="mm")
Overview community structure t0
To visualize the community composition, the most abundant phyla are identified while grouping less abundant phyla as “Other”. ASVs that are not identified on phylum level are included as “Unidentified”.
Expand code
t <- seqtab %>%filter(sample %in% t0) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%filter(!is.na(phylum)) %>%top_n(8, mean_relab)t %>%arrange(desc(mean_relab)) %>% knitr::kable()
p3 <- seqtab %>%filter(sample %in% t0) %>%inner_join(taxref, by ="seqid") %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, topphylum) %>%summarise(mrelab =sum(relab), .groups ='drop') %>%# Call the plotggplot(aes(x =fct_relevel(sample, t0), y = mrelab *100, fill =fct_relevel(topphylum, names(cols.phylum) %>%rev()) ) ) +labs(x ='', y ='Relative abundance (%)', fill ="") +geom_col() +scale_y_continuous(expand =c(0.01,0), limits =c(-10,101)) +annotate(geom ="segment", x =0.55, xend =3.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =3.55, xend =6.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =6.55, xend =9.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =9.55, xend =12.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =12.55, xend =15.45, y =-1, yend =-1, linewidth =0.5) +annotate(geom ="segment", x =15.55, xend =18.45, y =-1, yend =-1, linewidth =0.5) +# Labels datesannotate(geom ="text", x =2, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =5, y =-3.5, label ="April 2020", size =7/.pt) +annotate(geom ="text", x =8, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =11, y =-3.5, label ="April 2020", size =7/.pt) +annotate(geom ="text", x =14, y =-3.5, label ="Oct 2019", size =7/.pt) +annotate(geom ="text", x =17, y =-3.5, label ="April 2020", size =7/.pt) +# Labels groundwaterannotate(geom ="text", x =3.5, y =-8, label ="Meteoric", size =7/.pt) +annotate(geom ="text", x =9.5, y =-8, label ="Marine", size =7/.pt) +annotate(geom ="text", x =15.5, y =-8, label ="Saline", size =7/.pt) +scale_fill_manual(values = cols.phylum) +guides(fill =guide_legend(ncol =3, reverse =TRUE)) +theme_tidy(legend ="bottom") +theme(aspect.ratio =0.75,axis.text.x =element_blank(), axis.ticks.x =element_blank(),legend.key.height =unit(3.5, "mm"),legend.margin =margin(t =-20),legend.text =element_text(margin =margin(l =2)),panel.border =element_rect(fill ="transparent", linewidth =0.75),legend.key.spacing.y =unit(0, "mm") )p3
Figure 3: Community composition of the groundwater samples (t0) for each groundwater type (n = 6) before (October 2019) and after (April 2020) collection of groundwater for inoculation. The eight most abundant phyla are shown, grouping less abundant phyla as “Other” and uncharacterized ASVs as “Unidentified”.
As the hydrochemistry data is from a monitoring program, the sampling data closest to groundwater sampling was selected. All units are mg l-1, except for EC (mS m-1), pH, T (degrees C), and O18 (ppt SMOW)
The number of ASVs per sample are estimated with the specnumber() function from the Vegan library.
Expand code
adiv <- seqtab %>%select(-relab) %>%pivot_wider(names_from ="seqid", values_from ="count", values_fill =0) %>%column_to_rownames('sample') %>% vegan::specnumber() %>%as.data.frame() %>%rownames_to_column('sample') %>%#filter(sample %in% t0) %>%rename(specnumber =2)chem <- adiv %>%# Add the species richness for each sampleinner_join(chem, by ="sample")
In order to run the RDA, the absolute counts are standardized using a Hellinger transformation (square root of the relative abundance).
Expand code
hellinger <- seqtab %>%filter(sample %in% t0) %>%# Standard Vegan transformation: Turn table with with samples as *rows*select(sample, seqid, count) %>%pivot_wider(names_from ="seqid", values_from ="count", values_fill =0) %>%# Turn into a numeric matrixcolumn_to_rownames('sample') %>% vegan::decostand(method ="hellinger") %>%rownames_to_column("sample") %>%pivot_longer(-1, names_to ="seqid", values_to ="hellinger")m <- hellinger %>%# Create a matrixspread(seqid, hellinger) %>%column_to_rownames("sample")
The RDA is called while providing a large number of constraining factors, as the RDA function will by default not include redundant variables (Pearson r > 0.9).
Some constraints or conditions were aliased because they were redundant. This
can happen if terms are linearly dependent (collinear): 'ph_lab', 'ec_lab',
'temp_w', 'nh4_n', 'feii'
Expand code
# This one will contain the proportions explained which we get by dividing the# eigenvalue of each component with the sum of eigenvalues for all components.p <-c(rda$CCA$eig, rda$CA$eig)/sum(c(rda$CCA$eig, rda$CA$eig))# This one is collecting the coordinates for arrows that will depict how the # factors in our model point in the coordinatea <- rda$CCA$biplot %>%data.frame() %>% tibble::rownames_to_column('factor')lab <-tibble(factor =c("o18", "doc"),label =c("delta^18*O", "DOC")) %>%inner_join(a, by ="factor")summary(rda)
ggsave("figures/fig-1.pdf", width =18, height =10, units ="cm")
Network analysis: phylum
Expand code
t <-c("Spirochaetota", "Acidobacteriota", "Atribacterota", "Chloroflexota","Omnitrophota", "Nanoarchaeota", "Campylobacterota", "Pseudomonadota")network.survey <-read_tsv("data/network-survey.tsv", show_col_types = F)# Correlation between the abundance of the phyla using the culturesfit <-tibble("phylum"= t, "r.squared"=NA)for (i in t) { model <-lm(relab ~ cpr, data = network.survey %>%filter(phylum == i)) %>%summary() fit[fit$phylum == i, "r.squared"] <- model$adj.r.squared %>%round(digits =2)}fit
t <-c("Omnitrophota", "Chloroflexota", "Pseudomonadota", "Actinomycetota","Verrucomicrobiota", "Acidobacteriota", "Bacillota", "Spirochaetota")network <- seqtab %>%inner_join(smd, by ="sample") %>%filter(fraction =="non-fractionated") %>%inner_join(tax, by ="seqid") %>%filter(phylum %in% t) %>%group_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ="drop") network <- seqtab %>%inner_join(tax, by ="seqid") %>%filter(phylum =="Patescibacteria") %>%group_by(sample, phylum) %>%summarise(cpr =sum(relab), .groups ="drop") %>%select(-phylum) %>%inner_join(network, by ="sample")# Correlation between the abundance of the phyla using the culturesfit <-tibble("phylum"= t, "r.squared"=NA)for (i in t) { model <-lm(relab ~ cpr, data = network %>%filter(phylum == i)) %>%summary() fit[fit$phylum == i, "r.squared"] <- model$adj.r.squared %>%round(digits =2)}fit <- fit %>%arrange(desc(r.squared))fit
ggsave("figures/fig-s4.pdf", width =140, height =160, units ="mm")
16S data cultures
NMDS cultures three groundwaters (supplemental)
Expand code
set.seed(999)nmds <- seqtab %>%# Full size fractioninner_join(smd, by ="sample") %>%filter(!sample %in%paste("P21015_10", seq(42,81), sep ="")) %>%# Data from bachelor projectfilter(fraction =="non-fractionated"| fraction =="environmental") %>%group_by(seqid) %>%filter(count >2) %>%ungroup() %>%select(seqid, sample, count) %>%spread(seqid, count, fill =0) %>%column_to_rownames("sample") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1923165
Run 1 stress 0.2120395
Run 2 stress 0.1923165
... Procrustes: rmse 4.071803e-06 max resid 1.411655e-05
... Similar to previous best
Run 3 stress 0.1923165
... Procrustes: rmse 4.97049e-06 max resid 1.741608e-05
... Similar to previous best
Run 4 stress 0.2227196
Run 5 stress 0.2150739
Run 6 stress 0.2302776
Run 7 stress 0.2093082
Run 8 stress 0.2093082
Run 9 stress 0.2247649
Run 10 stress 0.1923165
... Procrustes: rmse 9.752949e-06 max resid 3.745239e-05
... Similar to previous best
Run 11 stress 0.2236037
Run 12 stress 0.2215293
Run 13 stress 0.2213196
Run 14 stress 0.2142431
Run 15 stress 0.1923165
... Procrustes: rmse 7.963952e-06 max resid 2.772181e-05
... Similar to previous best
Run 16 stress 0.1923165
... Procrustes: rmse 1.028341e-05 max resid 4.281696e-05
... Similar to previous best
Run 17 stress 0.2120395
Run 18 stress 0.1923165
... Procrustes: rmse 6.422786e-06 max resid 2.085941e-05
... Similar to previous best
Run 19 stress 0.2154001
Run 20 stress 0.1923165
... New best solution
... Procrustes: rmse 6.318389e-06 max resid 2.826773e-05
... Similar to previous best
*** Best solution repeated 1 times
Expand code
vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample") -> nmds.scores
nmds <- seqtab %>%# Small size fractioninner_join(smd, by ="sample") %>%filter(!sample %in%paste("P21015_10", seq(42,81), sep ="")) %>%# Data from bachelor projectfilter(fraction =="0.1 - 0.45 µm"| fraction =="environmental") %>%group_by(seqid) %>%filter(count >2) %>%ungroup() %>%select(seqid, sample, count) %>%spread(seqid, count, fill =0) %>%column_to_rownames("sample") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1068401
Run 1 stress 0.2760336
Run 2 stress 0.10684
... New best solution
... Procrustes: rmse 3.267484e-05 max resid 9.142731e-05
... Similar to previous best
Run 3 stress 0.1068401
... Procrustes: rmse 8.950307e-05 max resid 0.0003677584
... Similar to previous best
Run 4 stress 0.10684
... Procrustes: rmse 2.829675e-05 max resid 0.0001164834
... Similar to previous best
Run 5 stress 0.10684
... New best solution
... Procrustes: rmse 5.316658e-05 max resid 0.0002191173
... Similar to previous best
Run 6 stress 0.1068401
... Procrustes: rmse 0.0001439016 max resid 0.0005893127
... Similar to previous best
Run 7 stress 0.1068401
... Procrustes: rmse 0.000100918 max resid 0.0004154544
... Similar to previous best
Run 8 stress 0.10684
... Procrustes: rmse 1.307685e-05 max resid 5.27817e-05
... Similar to previous best
Run 9 stress 0.10684
... Procrustes: rmse 8.745294e-05 max resid 0.0003599189
... Similar to previous best
Run 10 stress 0.10684
... Procrustes: rmse 1.398977e-05 max resid 5.399632e-05
... Similar to previous best
Run 11 stress 0.1068401
... Procrustes: rmse 0.000182129 max resid 0.0007494872
... Similar to previous best
Run 12 stress 0.10684
... Procrustes: rmse 1.292282e-05 max resid 4.180245e-05
... Similar to previous best
Run 13 stress 0.1068402
... Procrustes: rmse 0.0001867016 max resid 0.0007641379
... Similar to previous best
Run 14 stress 0.10684
... Procrustes: rmse 6.740886e-05 max resid 0.0002775373
... Similar to previous best
Run 15 stress 0.10684
... New best solution
... Procrustes: rmse 1.911565e-05 max resid 7.683967e-05
... Similar to previous best
Run 16 stress 0.10684
... Procrustes: rmse 3.965406e-05 max resid 0.0001563627
... Similar to previous best
Run 17 stress 0.10684
... Procrustes: rmse 1.868823e-05 max resid 7.486526e-05
... Similar to previous best
Run 18 stress 0.1068401
... Procrustes: rmse 0.0001394823 max resid 0.0005685175
... Similar to previous best
Run 19 stress 0.1068401
... Procrustes: rmse 0.000156072 max resid 0.0006370961
... Similar to previous best
Run 20 stress 0.10684
... Procrustes: rmse 1.551736e-05 max resid 6.113502e-05
... Similar to previous best
*** Best solution repeated 6 times
Expand code
vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample") -> nmds.scores
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1203684
Run 1 stress 0.1448066
Run 2 stress 0.1374292
Run 3 stress 0.1212802
Run 4 stress 0.1331208
Run 5 stress 0.1230642
Run 6 stress 0.13485
Run 7 stress 0.14842
Run 8 stress 0.133246
Run 9 stress 0.1408211
Run 10 stress 0.1212186
Run 11 stress 0.1440819
Run 12 stress 0.1336567
Run 13 stress 0.1400868
Run 14 stress 0.1447862
Run 15 stress 0.1504146
Run 16 stress 0.1212186
Run 17 stress 0.1342165
Run 18 stress 0.1305863
Run 19 stress 0.133246
Run 20 stress 0.1476626
*** Best solution was not repeated -- monoMDS stopping criteria:
14: stress ratio > sratmax
6: scale factor of the gradient < sfgrmin
Expand code
nmds.scores <- vegan::scores(nmds)$sites %>%as.data.frame() %>%rownames_to_column("sample") %>%inner_join(smd, by ="sample")
Expand code
# 'i' contains the samples that will be included in panel ci <-c("P21015_1045", "P21015_1049", "P21015_1053", "P21015_1065", "P21015_1069", "P21015_1077", # Lysate"P21015_1043", "P21015_1047", "P21015_1051", "P21015_1055", "P21015_1075", "P21015_1079"# Acetate)i <- smd %>%filter(grepl("P21", sample)) %>%filter(fraction =="fractionated") %>%arrange(medium) %>%pull(sample)p2 <- nmds.scores %>%mutate(medium =factor(medium, levels =c("ace","lys","env"))) %>%ggplot(aes(x = NMDS1, y = NMDS2, color = fraction, shape = medium,fill = fraction ) ) +geom_vline(xintercept =0, linetype ='dotted') +geom_hline(yintercept =0, linetype ='dotted') +geom_point(size =4.0, stroke =0.75) +geom_text(data = nmds.scores, aes(label = age), size =6/.pt, color ="black", na.rm =TRUE) +scale_shape_manual(values =c("ace"=23, "lys"=24, "env"=21), labels =c("Acetate", "Lysate", "Environmental")) +scale_fill_manual(values =alpha(cols.fraction, alpha =0.2), labels =c("Environmental", "< 0.45 µm", "All cells")) +scale_color_manual(values = cols.fraction, labels =c("Environmental", "0.1 - 0.45 µm", "All cells")) +annotate('text', x =Inf, y =-Inf, size =6/.pt, hjust =1.05, vjust =-0.5, label =paste('Stress = ', round(nmds$stress, digits =2))) +theme_tidy(legend ="bottom", fontsize =6) +theme(legend.title =element_blank(),legend.box ="vertical",legend.spacing.x =unit(0, "mm"),legend.box.margin =margin(),legend.margin =margin(),legend.text =element_text(margin =margin(l =2, r =8, unit ="pt")),legend.key.spacing.x =unit(0, "mm") )
Expand code
cols.phylum <-c("Pseudomonadota"="#972D15","Spirochaetota"="#D8B70A","Patescibacteria"="#C7B19C","Nanoarchaeota"="#046C9A","Omnitrophota"="#00A08A","Desulfobacterota"="#A2A475","Acidobacteriota"="#FAEFD1" ) p3 <- seqtab %>%filter(sample %in% i) %>%inner_join(tax, by ="seqid") %>%filter(phylum %in%names(cols.phylum)) %>%group_by(sample, phylum) %>%summarise(n =sum(relab) *100, .groups ="drop") %>%# Order the samples and the phylamutate(phylum =factor(phylum, levels =names(cols.phylum))) %>%mutate(sample =factor(sample, levels = i)) %>%ggplot(aes(x = sample,y = n,fill =fct_rev(phylum) )) +geom_col() +scale_fill_manual(values = cols.phylum) +scale_y_continuous(limits =c(NA, 100)) +geom_text(data = smd %>%filter(sample %in% i) %>%mutate(sample =factor(sample, i)),aes(x = sample, y =3, label = age), size =6/.pt, inherit.aes =FALSE ) +labs(x ="", y ="Relative abundance (%)") +theme_tidy(legend ="bottom", fontsize =6) +guides(fill =guide_legend(ncol =2, reverse =TRUE)) +annotate("segment", x =0.5, xend =10.45, y =-2, linewidth =0.5) +annotate("text", x =5.5, y =-5, label ="Acetate", size =6/.pt) +annotate("segment", x =10.55, xend =20.45, y =-2, linewidth =0.5) +annotate("text", x =15.5, y =-5, label ="Lysate", size =6/.pt) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.title =element_blank(),legend.key.width =unit(3.5, "mm"), legend.key.height =unit(4.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.key.spacing.y =unit(0.2, "mm"),legend.margin =margin(t =-15) )
ggsave(filename ="figures/fig-2.pdf", width =14, height =14, units ="cm")
Barplot KR0015
Expand code
t <- seqtab %>%filter(sample %in%paste("P21015", seq(1042,1081), sep ="_") ) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%filter(!is.na(phylum)) %>%top_n(9, mean_relab) %>%filter(phylum !="Bacteroidota")# The most abundant phyla, averaged over all culturest %>%arrange(desc(mean_relab))
seqtab %>%filter(sample %in%paste("P21015", seq(1042,1081), sep ="_") ) %>%inner_join(taxref, by ="seqid") %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, topphylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%mutate(age =as.numeric(.$age)) %>%mutate(fraction =case_when( fraction =="non-fractionated"& medium =="ace"~"Acetate, all cells", fraction =="non-fractionated"& medium =="lys"~"Lysate, all cells", fraction =="fractionated"& medium =="ace"~"Acetate, < 0.45 µm fraction", fraction =="fractionated"& medium =="lys"~"Lysate, < 0.45 µm fraction" )) %>%mutate(topphylum =factor(topphylum, levels =rev(names(cols.phylum)))) %>%# Call the plotggplot(aes( age, relab *100, fill = topphylum)) +geom_col() +scale_x_continuous(n.breaks =10) +facet_wrap(~ fraction, nrow =1) +labs(x ='Week', y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum) +guides(fill =guide_legend(reverse =TRUE, ncol =3)) +theme_tidy(legend ="bottom") +theme(aspect.ratio =1.2,panel.border =element_rect(fill ="transparent", linewidth =0.5),strip.background =element_blank(), legend.background =element_blank(),legend.key.height =unit(4, "mm"),legend.key.spacing.y =unit(0, "mm"),plot.margin =margin(),axis.ticks.length =unit(0.5, "mm"),legend.box.margin =margin(t =-5) )
Expand code
ggsave("figures/fig-s3.pdf", height =9, width =18, units ="cm")
Barplot KR0015 - SA1420 - SA2600
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="KR0015") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
p1 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="KR0015") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="ace"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="ace"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA1420") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
Expand code
p2 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA1420") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="ace"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="ace"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
Expand code
t <- seqtab %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA2600") %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%group_by(phylum, sample) %>%# Sum the abundance of each phylum within a samplesummarise(relab =sum(relab), .groups ='drop_last') %>%# Calculate the mean abundance of each phylum over the categoriessummarise(mean_relab =sum(relab), .groups ='drop') %>%top_n(8, mean_relab) %>%arrange(desc(mean_relab)) %>%pull(phylum)
Expand code
p3 <- seqtab %>%filter(grepl("P26", sample)) %>%inner_join(tax, by ="seqid") %>%mutate(phylum =if_else(phylum %in% t, phylum, "Other")) %>%# Summarize in order to have the sum for each category and topphylumgroup_by(sample, phylum) %>%summarise(relab =sum(relab), .groups ='drop') %>%inner_join(smd, by ="sample") %>%filter(groundwater =="SA2600") %>%mutate(phylum =factor(phylum, levels =names(rev(cols.phylum)))) %>%mutate(fraction =case_when( fraction =="fractionated"& medium =="lys"~"Fractionated: lysate", fraction =="fractionated"& medium =="ace"~"Fractionated: acetate", fraction =="non-fractionated"& medium =="lys"~"All cells: lysate", fraction =="non-fractionated"& medium =="ace"~"All cells: acetate", )) %>%# Call the plotggplot(aes( sample, relab *100, fill = phylum)) +geom_col(show.legend =TRUE) +facet_wrap(~ fraction, scales ="free_x", nrow =1) +labs(x ="", y ='Relative abundance (%)', fill ='') +scale_fill_manual(values = cols.phylum, drop =FALSE) +theme_tidy(legend ="bottom") +guides(fill =guide_legend(nrow =3, reverse =TRUE)) +theme(axis.text.x =element_blank(),axis.ticks.x =element_blank(),legend.key.width =unit(3.5, "mm"),legend.text =element_text(margin =margin(l =2)),legend.margin =margin(t =-10) )
lm(coding_density ~ genome_size, data = quality) %>%summary()
Call:
lm(formula = coding_density ~ genome_size, data = quality)
Residuals:
Min 1Q Median 3Q Max
-0.05556 -0.01303 -0.00046 0.01275 0.03787
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.172e-01 8.112e-03 113.064 < 2e-16 ***
genome_size -6.991e-09 2.275e-09 -3.072 0.00424 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02103 on 33 degrees of freedom
Multiple R-squared: 0.2224, Adjusted R-squared: 0.1989
F-statistic: 9.439 on 1 and 33 DF, p-value: 0.004236
Expand code
p3 <- i %>%ggplot(aes( genome.size, coding_density, color = phylum,fill = phylum )) +geom_smooth(method ="lm", formula = y ~ x, colour ="black", fill ="#d9d9d9", se =TRUE, linewidth =0.4) +geom_point(size =2.5, shape =21, stroke =0.5) +labs(x ="Estimated genome size (Mb)", y ="Estimated coding density", color ="") +scale_colour_manual(values = cols.mag, guide ="none") +scale_fill_manual(values =alpha(cols.mag, alpha =0.5), guide ="none") +theme_tidy() +scale_x_continuous(breaks =c(1,2,3,4,5,6)) +scale_y_continuous(n.breaks =7) +annotate("text", x =Inf, y =Inf, label ="paste(R ^ 2, \" = 0.20\")", size =7/.pt, vjust =1, hjust =1, parse =TRUE) +theme(axis.line =element_line(),panel.border =element_blank() )
Expand code
nmds <- coverm %>%# Sum the abundance of both coverage per metagenomemutate(abundance =1) %>%inner_join(emapper, by ="genome", relationship ="many-to-many") %>%# Filter out NA and add multiple KOs on new linefilter(kegg_ko !="-") %>%separate_rows(kegg_ko, sep =",") %>%# Sum the abundance of each KO within the metagenomesgroup_by(genome, kegg_ko) %>%summarise(abundance =sum(abundance), .groups ="drop") %>%# Pivot to prepare a numerical matrixpivot_wider(names_from ="kegg_ko", values_from ="abundance", values_fill =0) %>%column_to_rownames("genome") %>% vegan::metaMDS()
Square root transformation
Wisconsin double standardization
Run 0 stress 0.06277184
Run 1 stress 0.06660901
Run 2 stress 0.06506398
Run 3 stress 0.06263043
... New best solution
... Procrustes: rmse 0.002502074 max resid 0.01128171
Run 4 stress 0.0661089
Run 5 stress 0.07059065
Run 6 stress 0.06647542
Run 7 stress 0.06601262
Run 8 stress 0.06459214
Run 9 stress 0.06898965
Run 10 stress 0.0650395
Run 11 stress 0.06277191
... Procrustes: rmse 0.002517934 max resid 0.01114544
Run 12 stress 0.06539869
Run 13 stress 0.06277223
... Procrustes: rmse 0.002536871 max resid 0.01104603
Run 14 stress 0.06277183
... Procrustes: rmse 0.002499626 max resid 0.01127226
Run 15 stress 0.06263054
... Procrustes: rmse 7.567288e-05 max resid 0.0002464256
... Similar to previous best
Run 16 stress 0.06660889
Run 17 stress 0.06360088
Run 18 stress 0.06299217
... Procrustes: rmse 0.0781651 max resid 0.1490143
Run 19 stress 0.06286022
... Procrustes: rmse 0.0781924 max resid 0.1486652
Run 20 stress 0.0660097
*** Best solution repeated 1 times